Enterprise Database Systems
Data Wrangling with Python/Pandas
Data Wrangling with Pandas: Advanced Features
Data Wrangling with Pandas: Visualizations and Time-Series Data
Data Wrangling with Pandas: Working with Series & DataFrames
Final Exam: Data Wrangler

Data Wrangling with Pandas: Advanced Features

Course Number: it_dsdwppdj_03_enus

Lesson Objectives

  • Course Overview
  • perform grouping and aggregations on data
  • work with multiple, hierarchical indexes
  • specify grouping and aggregations with multiple indexes
  • perform general user-defined aggregations
  • extract subsets of data using filtering
  • identify kinds of masking operations
  • troubleshoot data with duplicates
  • identify how categorical data differs from continuous
  • perform filtering operations on categorical data
  • recognize default and custom indexes and reindex DataFrames
  • perform filtering operations, drop duplicate data, and work with categories

Overview/Description

This course uses Python, the preferred programming language for data science, to explore Pandas, a popular Python library that is part of the open-source PyData stack. In this 11-video Skillsoft Aspire course, learners will use the Pandas DataFrame to perform advanced grouping by category, aggregation, and filtering operations. You will see how to use Pandas to retrieve a subset of your data by filtering on both rows and columns. You will analyze multilevel data by applying the groupby operation to DataFrames. You will then learn to use data masking, or data obfuscation, to protect classified or commercially sensitive data. Learners will also work with duplicate data, an important part of data cleaning. You will examine the two broad categories of data: continuous data, which comprises a continuous range of values, and categorical data, which has discrete, finite values. Pandas automatically generates an index for the rows of every DataFrame, and here you will learn to perform different types of reindexing operations on DataFrames.
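The operations listed for this course can be sketched on a small DataFrame. The column names (`region`, `product`, `sales`) and the values are made up for illustration; the methods used (`drop_duplicates`, `groupby`, `agg`, boolean indexing with `loc`, `mask`, `astype("category")`, `reindex`) are standard Pandas calls. This is a minimal sketch, not the course's own notebook code.

```python
import pandas as pd

# Hypothetical sales data; the last row duplicates the one before it.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "A"],
    "sales": [100, 150, 200, 120, 120],
})

# Drop exact duplicate rows (removes the second "West"/"A"/120 row).
df = df.drop_duplicates()

# Grouping on two keys yields a result with a hierarchical (MultiIndex) index.
totals = df.groupby(["region", "product"])["sales"].sum()

# A user-defined aggregation: the range (max - min) of sales per region.
spread = df.groupby("region")["sales"].agg(lambda s: s.max() - s.min())

# Filtering: keep rows with sales above 110 and only two of the columns.
big = df.loc[df["sales"] > 110, ["region", "sales"]]

# Masking: values where the condition is True are replaced with NaN.
masked = df["sales"].mask(df["sales"] < 150)

# Categorical data: a discrete, finite set of values, then filtered.
df["region"] = df["region"].astype("category")
east_only = df[df["region"] == "East"]

# Reindexing: labels missing from the original index become all-NaN rows.
reindexed = df.reindex([0, 1, 2, 3, 4])
```

Because `drop_duplicates` runs first, label `4` no longer exists, so `reindex` brings it back as an empty (all-NaN) row.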



Target

Prerequisites: none

Data Wrangling with Pandas: Visualizations and Time-Series Data

Course Number: it_dsdwppdj_02_enus

Lesson Objectives

  • Course Overview
  • load and explore the dataset used for visualization
  • plot pie charts, box plots, and scatter plots using Pandas
  • identify and work with time-series data
  • calculate deltas and percentage returns in stock prices
  • define time deltas and date ranges in Pandas
  • clean missing data in mismatched DataFrames
  • identify string data stored in DataFrames
  • perform advanced manipulations on string data
  • change column values by applying functions
  • transform data with user-defined functions
  • transform all columns in a DataFrame
  • recall how to plot visuals and transform column values

Overview/Description

This 12-video Skillsoft Aspire course uses Python, the preferred programming language for data science, to explore data in Pandas with popular chart types such as the bar graph, histogram, pie chart, and box plot. Discover how to work with time-series and string data in data sets. Pandas represents data in a tabular format, which makes it easy to perform data manipulation, cleaning, and exploration, all important parts of any data engineer's toolkit. You will learn how to use Matplotlib, a multiplatform data visualization library built on NumPy, the Python library used to work with multidimensional arrays. Learners will use Pandas features to work with specific kinds of data, such as time-series data and string data. The course includes a real-world demonstration that uses Pandas to analyze Amazon's stock market returns. Finally, you will learn how to apply transformations that clean, format, and reshape data into a useful form for further analysis.
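A rough sketch of the time-series, string, and transformation steps described above. The prices are invented placeholders, not actual Amazon data, and the ticker strings are hypothetical; `date_range`, `diff`, `pct_change`, `Timedelta`, the `.str` accessor, and `apply` are standard Pandas features.

```python
import pandas as pd

# Hypothetical closing prices on five consecutive business days.
dates = pd.date_range("2024-01-01", periods=5, freq="B")
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0], index=dates)

# Deltas (day-over-day change) and percentage returns.
deltas = prices.diff()
returns = prices.pct_change()

# Time deltas: subtracting timestamps yields a Timedelta, and a
# Timedelta can be added to a timestamp to shift it.
span = prices.index[-1] - prices.index[0]
next_week = dates[0] + pd.Timedelta(days=7)

# String data: vectorized string methods live under the .str accessor.
tickers = pd.Series(["  amzn ", "AAPL", " msft"])
clean = tickers.str.strip().str.upper()

# Changing column values by applying a user-defined function.
df = pd.DataFrame({"price": prices.values, "ticker": "AMZN"})
df["label"] = df["price"].apply(lambda p: "high" if p >= 105 else "low")
```

The first entries of `deltas` and `returns` are NaN, since there is no earlier price to compare against.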



Target

Prerequisites: none

Data Wrangling with Pandas: Working with Series & DataFrames

Course Number: it_dsdwppdj_01_enus

Lesson Objectives

  • Course Overview
  • install and work with Pandas
  • create and configure Pandas Series objects
  • perform data wrangling operations on Series objects
  • use appending and sorting operations on Series objects
  • create and configure Pandas DataFrame objects
  • perform indexing operations on DataFrames
  • identify and troubleshoot missing data
  • work with aggregations on columns
  • perform statistical operations on DataFrames
  • recall basic concepts and instantiate Series and DataFrame objects

Overview/Description

Pandas, a popular Python library, is part of the open-source PyData stack. In this 10-video Skillsoft Aspire course, you will learn that Pandas represents data in a tabular format, which makes it easy and intuitive to perform data manipulation, cleaning, and exploration. You will use the Pandas DataFrame, a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). To take this course, you should already be familiar with the Python programming language; all code is written in Jupyter notebooks. You will work with basic Pandas data structures, including the Pandas Series object, which represents a single column of data and can store numerical values, strings, Booleans, and more complex data types. Learn how to use the Pandas DataFrame, which represents data in table form. Finally, learn to append and sort Series values, handle missing data, add columns, and aggregate data in a DataFrame. The closing exercise involves instantiating a Pandas Series object from both a list and a dictionary, changing the Series index to something other than the default, and sorting Series values in place.
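The closing exercise and the basic DataFrame operations can be sketched as follows. The labels, names, and scores are hypothetical, and filling the missing score with the column mean is just one assumed way of handling missing data, not necessarily the approach used in the course.

```python
import pandas as pd

# A Series built from a list gets the default RangeIndex (0, 1, 2).
s_list = pd.Series([30, 10, 20])

# A Series built from a dict uses the dict keys as its index labels.
s_dict = pd.Series({"b": 2, "a": 1, "c": 3})

# Replace the default index with custom labels.
s_list.index = ["x", "y", "z"]

# Sort values in place: the Series itself is modified (the call returns None).
s_list.sort_values(inplace=True)

# A DataFrame with one missing value (None becomes NaN).
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "score": [90.0, None, 70.0]})

# Identify missing data, then fill it with the mean of the known scores.
n_missing = df["score"].isna().sum()
df["score"] = df["score"].fillna(df["score"].mean())

# Add a derived column, then aggregate a column with several functions.
df["passed"] = df["score"] >= 75
summary = df["score"].agg(["mean", "min", "max"])
```

Sorting keeps each value paired with its label, so after the in-place sort the index order becomes `y`, `z`, `x`.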



Target

Prerequisites: none

Final Exam: Data Wrangler

Course Number: it_fedads_02_enus

Lesson Objectives

  • apply a group by transformation to aggregate with a conditional value
  • apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset
  • build and run the application and confirm the output using HDFS from both the command line and the web application
  • change column values by applying functions
  • change date formats to the ISO 8601 standard
  • code up a Combiner for the MapReduce application and configure the Driver to use it for a partial reduction on the Mapper nodes of the cluster
  • compare managed and external tables in Hive and how they relate to the underlying data
  • configure and test PyMongo in a Python program
  • configure the Reducer and the Driver for the inverted index application
  • create and analyze categories of data in a dataset using windows
  • create and configure Pandas DataFrame objects
  • create and configure Pandas Series objects
  • create and instantiate a directed acyclic graph in Airflow
  • create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame
  • create the driver program for the MapReduce application
  • define and run a join query involving two related tables
  • define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
  • define the Mapper for a MapReduce application to build an inverted index from a set of text files
  • define what a window is in the context of Spark DataFrames and when they can be used
  • demonstrate how to ingest data using Sqoop
  • describe data ingestion approaches and compare Avro and Parquet file format benefits
  • describe the benefits of serverless and lambda architectures
  • describe the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data with data lakes
  • describe the different primitive and complex data types available in Hive
  • extract subsets of data using filtering
  • flatten multi-dimensional data structures by chaining lateral views
  • handle common errors encountered when reading CSV data
  • identify and troubleshoot missing data
  • identify and work with time-series data
  • identify kinds of masking operations
  • implement a multi-stage aggregation pipeline
  • implement data lakes using AWS
  • implement deep learning using Keras
  • install MongoDB and implement data partitioning using MongoDB
  • list the prominent distributed data models along with their associated implementation benefits
  • list the various frameworks that can be used to process data from data lakes
  • load a few rows of data into a table and query it with simple select statements
  • load multiple sheets from an Excel document
  • perform create, read, update, and delete operations on a MongoDB document
  • perform statistical operations on DataFrames
  • plot pie charts, box plots, and scatter plots using Pandas
  • recall the prominent data pattern implementations in microservices
  • recognize the capabilities of Microsoft machine learning tools
  • recognize the machine learning tools provided by AWS for data analysis
  • recognize the read and write optimizations in MongoDB
  • set up and install Apache Airflow
  • split columns based on a pattern
  • test Airflow tasks using the airflow command line utility
  • trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
  • use a regular expression to extract data into a new column
  • use a Spark accumulator as a counter
  • use createIndex to build an index on a collection
  • use Maven to create a new project for a MapReduce application and plan out the Map and Reduce phases by examining the auto prices dataset
  • use the alter table statement to change the definition of a Hive table
  • use the find operation to select documents from a collection
  • use the mongoexport tool to export data from MongoDB to JSON and CSV
  • use the mongoimport tool to import from JSON and CSV
  • use the UNION and UNION ALL operations on table data and distinguish between the two
  • work with data in the form of key-value pairs (map data structures) in Hive
  • work with scikit-learn to implement machine learning

Overview/Description

Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.



Target

Prerequisites: none
